Large-Scale Text Similarity Computing with Spark
نویسندگان
چکیده
منابع مشابه
Towards Large Scale Environmental Data Processing with Apache Spark
Currently available environmental datasets are either manually constructed by professionals or automatically generated from the observations provided by sensing devices. Usually, the former are modelled and recorded with traditional general-purpose relational technologies, whereas the latter require more specific scientific array formats and tools. Declarative data processing technologies are a...
متن کاملLarge-Scale Online Expectation Maximization with Spark Streaming
Many “Big Data” applications in Machine Learning (ML) need to react quickly to large streams of incoming data. The standard paradigm nowadays is to run ML algorithms on frameworks designed for batch operations, such as MapReduce or Hadoop. By design, these frameworks are not a good match for low-latency applications. This is why we explore using a new, recently proposed model for large-scale st...
متن کاملLarge Scale Sentiment Analysis on Twitter with Spark
Sentiment analysis on Twitter data has attracted much attention recently. One of the system’s key features, is the immediacy in communication with other users in an easy, user-friendly and fast way. Consequently, people tend to express their feelings freely, which makes Twitter an ideal source for accumulating a vast amount of opinions towards a wide diversity of topics. This amount of informat...
متن کاملLarge-Scale Similarity Joins With Guarantees
The ability to handle noisy or imprecise data is becoming increasingly important in computing. In the database community the notion of similarity join has been studied extensively, yet existing solutions have offered weak performance guarantees. Either they are based on deterministic filtering techniques that often, but not always, succeed in reducing computational costs, or they are based on r...
متن کاملLarge Scale Text Analysis
We take an algorithmic and computational approach to the problem of providing patent recommendations, developing a web interface that allows users to upload their draft patent and returns a list of ranked relevant patents in real time. We develop scalable, distributed algorithms based on optimization techniques and sparse machine learning, with a focus on both accuracy and speed.
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: International Journal of Grid and Distributed Computing
سال: 2016
ISSN: 2005-4262,2005-4262
DOI: 10.14257/ijgdc.2016.9.4.09